235 research outputs found

    Motion Estimation by Quadtree Pruning and Merging

    Bamboo: A fast descriptor based on AsymMetric pairwise BOOsting

    A robust hash, or content-based fingerprint, is a succinct representation of the perceptually most relevant parts of a multimedia object. A key requirement of fingerprinting is that elements with perceptually similar content should map to the same fingerprint, even if their bit-level representations are different. In this work we propose BAMBOO (Binary descriptor based on AsymMetric pairwise BOOsting), a binary local descriptor that combines content-based fingerprinting techniques with computationally efficient filters (box filters, Haar-like features, etc.) applied to image patches. In particular, we define a possibly large set of filters and iteratively select the most discriminative ones by means of an asymmetric pairwise boosting technique. The output values of the filtering process are quantized to one bit, leading to a very compact binary descriptor. Results show that this descriptor significantly outperforms binary descriptors of comparable complexity (e.g., BRISK) and approaches the discriminative power of state-of-the-art descriptors that are significantly more complex (e.g., SIFT and BinBoost).
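
    As a rough illustration of the one-bit quantization step described above, the following minimal sketch computes Haar-like responses (differences of two box sums) on a small patch and thresholds each response to a single bit. The filter layout, the thresholds, and all function names are assumptions made for illustration; in the paper the filters are selected by boosting, which is not reproduced here.

```python
from typing import List, Sequence, Tuple

Patch = List[List[float]]
# Hypothetical box layout: (top, left, height, width) inside the patch.
Box = Tuple[int, int, int, int]
# Hypothetical filter: (positive box, negative box, threshold).
Filter = Tuple[Box, Box, float]


def integral_image(patch: Patch) -> List[List[float]]:
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(patch), len(patch[0])
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += patch[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table: List[List[float]], box: Box) -> float:
    """Sum of the pixels inside the box, using four table lookups."""
    top, left, height, width = box
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def describe(patch: Patch, filters: Sequence[Filter]) -> int:
    """Quantize each filter response to one bit and pack the bits."""
    table = integral_image(patch)
    bits = 0
    for positive, negative, threshold in filters:
        response = box_sum(table, positive) - box_sum(table, negative)
        bits = (bits << 1) | (1 if response > threshold else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed descriptors."""
    return bin(a ^ b).count("1")


# Two illustrative filters on a 4x4 patch: left vs. right, top vs. bottom.
FILTERS: List[Filter] = [
    ((0, 0, 4, 2), (0, 2, 4, 2), 0.0),
    ((0, 0, 2, 4), (2, 0, 2, 4), 0.0),
]

left_bright = [[9.0, 9.0, 1.0, 1.0] for _ in range(4)]
left_bright_shifted = [[10.0, 10.0, 2.0, 2.0] for _ in range(4)]
top_bright = [[9.0] * 4, [9.0] * 4, [1.0] * 4, [1.0] * 4]
```

    In this toy setup the two left-bright patches map to the same two-bit descriptor despite different pixel values, while the top-bright patch differs in both bits, so matching reduces to a Hamming distance between integers.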

    Compress-then-analyze vs. analyze-then-compress: Two paradigms for image analysis in visual sensor networks

    We compare two paradigms for image analysis in visual sensor networks (VSNs). In the compress-then-analyze (CTA) paradigm, images acquired from camera nodes are compressed and sent to a central controller for further analysis. Conversely, in the analyze-then-compress (ATC) approach, camera nodes perform visual feature extraction and transmit a compressed version of these features to a central controller. We focus on state-of-the-art binary features, which are particularly suitable for resource-constrained VSNs, and we show that the "winning" paradigm depends primarily on the network conditions. Indeed, while the ATC approach might be the only possible way to perform analysis at low available bitrates, the CTA approach reaches the best results when the available bandwidth enables the transmission of high-quality images.

    Detection and identification of sparse audio tampering using distributed source coding and compressive sensing techniques

    In most practical applications, for the sake of information integrity, it is useful not only to detect whether multimedia content has been modified, but also to identify which kind of attack has been carried out. In the case of audio streams, for example, it may be useful to localize the tampering in the time and/or frequency domain. In this paper we devise a hash-based tampering detection and localization system exploiting compressive sensing principles. The multimedia content provider produces a small hash signature using a limited number of random projections of a time-frequency representation of the original audio stream. At the content user side, the hash signature is used to estimate the distortion between the original and the received stream and, provided that the tampering is sufficiently sparse or sparsifiable in some orthonormal basis expansion or redundant dictionary (e.g., DCT or wavelet), to identify the time-frequency portion of the stream that has been manipulated. In order to keep the hash length small, the algorithm exploits distributed source coding techniques.
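
    The idea of locating a sparse tampering from a few random projections can be sketched as follows. This is a deliberately simplified illustration, not the system from the paper: it assumes an unquantized hash, a tampering that touches a single coefficient, and a single matching-pursuit step for recovery. All names and sizes are hypothetical.

```python
import random
from typing import List


def random_projection_matrix(m: int, n: int, seed: int) -> List[List[float]]:
    """m x n Gaussian matrix; the seed plays the role of a shared secret."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def project(matrix: List[List[float]], signal: List[float]) -> List[float]:
    """Hash = a small number of random projections of the signal."""
    return [sum(a * x for a, x in zip(row, signal)) for row in matrix]


def locate_single_tamper(matrix: List[List[float]],
                         hash_original: List[float],
                         received: List[float]) -> int:
    """One matching-pursuit step on the hash residual.

    The residual equals the projection of the tampering alone, so for a
    one-coefficient tampering the best-aligned (normalized) column of the
    matrix gives its position.
    """
    hash_received = project(matrix, received)
    residual = [hr - ho for hr, ho in zip(hash_received, hash_original)]
    best_index, best_score = -1, -1.0
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        norm = sum(v * v for v in column) ** 0.5
        score = abs(sum(c * r for c, r in zip(column, residual))) / norm
        if score > best_score:
            best_index, best_score = j, score
    return best_index


N_COEFFS, N_PROJECTIONS = 64, 16          # illustrative sizes only
A = random_projection_matrix(N_PROJECTIONS, N_COEFFS, seed=1)

original = [float((i * 7) % 11) for i in range(N_COEFFS)]
signature = project(A, original)          # sent by the content provider

tampered = list(original)
tampered[37] += 5.0                       # hypothetical one-coefficient attack

located = locate_single_tamper(A, signature, tampered)
```

    Here 16 projections are enough to locate the modified coefficient among 64. Tamperings that affect several coefficients would need an iterative sparse-recovery method rather than a single step.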

    A visual sensor network for object recognition: Testbed realization

    This work describes the implementation of an object recognition service on top of energy- and resource-constrained hardware. A complete pipeline for object recognition based on the BRISK visual features is implemented on Intel Imote2 sensor devices. The reference implementation is used to assess the performance of the object recognition pipeline in terms of processing time and recognition accuracy.

    Briskola: BRISK optimized for low-power ARM architectures

    Coding binary local features extracted from video sequences

    Local features are a powerful tool exploited in several applications, such as visual search, object recognition and tracking. In this context, binary descriptors provide an efficient alternative to real-valued descriptors, due to low computational complexity, limited memory footprint and fast matching algorithms. The descriptor consists of a binary vector, in which each bit is the result of a pairwise comparison between smoothed pixel intensities. In several cases, visual features need to be transmitted over a bandwidth-limited network. To this end, it is useful to compress the descriptor to reduce the required rate, while attaining a target accuracy for the task at hand. Past literature has thoroughly addressed the problem of coding visual features extracted from still images and, only very recently, the problem of coding real-valued features (e.g., SIFT, SURF) extracted from video sequences. In this paper we propose a coding architecture specifically designed for binary local features extracted from video content. We exploit both spatial and temporal redundancy by means of intra-frame and inter-frame coding modes, showing that significant coding gains can be attained for a target level of accuracy of the visual analysis task.
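
    The two ingredients mentioned above, a descriptor built from pairwise intensity comparisons and the temporal redundancy between frames, can be illustrated with a small sketch. The sampling pairs, the intensity values, and the use of a plain XOR residual are assumptions made for illustration; the abstract does not specify the actual coding modes.

```python
from typing import List, Sequence, Tuple


def binary_descriptor(intensities: Sequence[float],
                      pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """One bit per pair: 1 if the first smoothed intensity is larger."""
    return [1 if intensities[i] > intensities[j] else 0 for i, j in pairs]


def inter_frame_residual(current: List[int], reference: List[int]) -> List[int]:
    """XOR against the matching descriptor from the previous frame."""
    return [c ^ r for c, r in zip(current, reference)]


def reconstruct(residual: List[int], reference: List[int]) -> List[int]:
    """Decoder side: XOR the residual back onto the reference."""
    return [d ^ r for d, r in zip(residual, reference)]


# Hypothetical sampling pattern over four smoothed sample points.
PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]

# Smoothed intensities of the same keypoint in two consecutive frames.
frame_a = [10.0, 20.0, 15.0, 5.0]
frame_b = [16.0, 19.0, 15.0, 5.0]

desc_a = binary_descriptor(frame_a, PAIRS)
desc_b = binary_descriptor(frame_b, PAIRS)
residual = inter_frame_residual(desc_b, desc_a)
```

    Because the keypoint changes little between frames, the residual contains a single 1, which an entropy coder could represent more cheaply than the raw descriptor. The decoder recovers `desc_b` exactly with `reconstruct(residual, desc_a)`.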

    Identification of Sparse Audio Tampering Using Distributed Source Coding and Compressive Sensing Techniques

    In the past few years, a large number of techniques have been proposed to identify whether multimedia content has been illegally tampered with. Nevertheless, very few efforts have been devoted to identifying which kind of attack has been carried out, especially due to the large amount of data required for this task. We propose a novel hashing scheme which exploits the paradigms of compressive sensing and distributed source coding to generate a compact hash signature, and we apply it to the case of audio content protection. The audio content provider produces a small hash signature by computing a limited number of random projections of a perceptual, time-frequency representation of the original audio stream; the audio hash is given by the syndrome bits of an LDPC code applied to the projections. At the content user side, the hash is decoded using distributed source coding tools. If the tampering is sparsifiable or compressible in some orthonormal basis or redundant dictionary, it is possible to identify the time-frequency position of the attack, with a hash size as small as 200 bits/second; the bit saving obtained by introducing distributed source coding ranges from 20% to 70%.
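
    The syndrome-based hash can be illustrated on a toy scale. The sketch below swaps the LDPC code used in the paper for a Hamming (7,4) parity-check matrix, purely as a stand-in: the provider sends only the 3 syndrome bits of a 7-bit word, and the user recovers that word from a local copy that differs in at most one bit. Function names and bit values are hypothetical.

```python
from typing import List

# Parity-check matrix of the Hamming (7,4) code: column p (1-indexed)
# is the binary representation of p, least significant bit in row 0.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(bits: List[int]) -> List[int]:
    """Syndrome bits s = H * bits (mod 2); this is the transmitted hash."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def decode_with_side_info(hash_bits: List[int], side_info: List[int]) -> List[int]:
    """Recover the original word from its syndrome and a correlated copy.

    Assumes the side information differs from the original in at most
    one position; the syndrome difference then spells out that position.
    """
    diff = [a ^ b for a, b in zip(syndrome(side_info), hash_bits)]
    position = diff[0] + 2 * diff[1] + 4 * diff[2]   # 0 means no difference
    recovered = list(side_info)
    if position:
        recovered[position - 1] ^= 1
    return recovered


original_bits = [1, 0, 1, 1, 0, 0, 1]      # quantized projections (toy values)
hash_bits = syndrome(original_bits)         # 3 bits sent instead of 7

received_bits = list(original_bits)
received_bits[4] ^= 1                       # the user's copy differs in one bit

recovered_bits = decode_with_side_info(hash_bits, received_bits)
```

    The saving here (3 bits instead of 7) comes from exploiting the correlation between the original and the received content. It is not meant to reproduce the 20% to 70% figures reported above.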

    Energy consumption of visual sensor networks: impact of spatio-temporal coverage

    Wireless visual sensor networks (VSNs) are expected to play a major role in future IEEE 802.15.4 personal area networks (PANs) under recently established collision-free medium access control (MAC) protocols, such as the IEEE 802.15.4e-2012 MAC. In such environments, the VSN energy consumption is affected by the number of camera sensors deployed (spatial coverage), as well as the number of captured video frames for which each node processes and transmits data (temporal coverage). In this paper we explore this aspect for uniformly formed VSNs, that is, networks comprising identical wireless visual sensor nodes connected to a collection node via a balanced cluster-tree topology, with each node producing independent identically distributed bitstream sizes after processing the video frames captured within each network activation interval. We derive analytic results for the energy-optimal spatio-temporal coverage parameters of such VSNs under a priori known bounds for the number of frames to process per sensor and the number of nodes to deploy within each tier of the VSN. Our results are parametric to the probability density function characterizing the bitstream size produced by each node and the energy consumption rates of the system of interest. Experimental results are derived from a deployment of TelosB motes and reveal that our analytic results are always within 7% of the energy consumption measurements for a wide range of settings. In addition, results obtained via motion JPEG encoding and feature extraction on a multimedia subsystem (BeagleBone Linux Computer) show that the optimal spatio-temporal settings derived by our framework allow for substantial reduction of energy consumption in comparison with ad hoc settings.

    Motion estimation and signaling techniques for 2D+t scalable video coding

    We describe a fully scalable wavelet-based 2D+t (in-band) video coding architecture. We propose new coding tools specifically designed for this framework, aimed at two goals: reducing the computational complexity at the encoder without sacrificing compression, and improving the coding efficiency, especially at low bitrates. To this end, we focus our attention on motion estimation and motion vector encoding. We propose a fast motion estimation algorithm that works in the wavelet domain and exploits the geometrical properties of the wavelet subbands. We show that the computational complexity grows linearly with the size of the search window, while approaching the performance of a full search strategy. We extend the proposed motion estimation algorithm to work with blocks of variable sizes, in order to better capture local motion characteristics, thus improving the rate-distortion behavior. Given this motion field representation, we propose a motion vector coding algorithm that adaptively scales the motion bit budget according to the target bitrate, improving the coding efficiency at low bitrates. Finally, we show how to optimally scale the motion field when the sequence is decoded at reduced spatial resolution. Experimental results illustrate the advantages of each individual coding tool presented in this paper. Based on these simulations, we define the best configuration of coding parameters and we compare the proposed codec with MC-EZBC, a widely used reference codec implementing the t+2D framework.
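
    For context, the full search strategy used as the performance reference above can be sketched as exhaustive block matching. This is a generic spatial-domain baseline under stated assumptions (fixed block size, sum of absolute differences as the cost, synthetic frames); it is not the wavelet-domain fast algorithm proposed in the paper, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(reference: Frame, current: Frame, top: int, left: int,
        dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between a current block and the
    reference block displaced by (dy, dx)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(current[top + r][left + c]
                         - reference[top + dy + r][left + dx + c])
    return total


def full_search(reference: Frame, current: Frame, top: int, left: int,
                size: int, radius: int) -> Tuple[int, int, int]:
    """Test every displacement in a (2*radius+1)^2 window; return
    (dy, dx, cost) of the best match."""
    rows, cols = len(reference), len(reference[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + size > rows or c0 + size > cols:
                continue  # candidate block falls outside the reference
            cost = sad(reference, current, top, left, dy, dx, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    assert best is not None  # (0, 0) is always a valid candidate
    return best


# Synthetic 8x8 frames: the current frame shows the reference content
# displaced by one row and two columns.
reference_frame = [[r * 10 + c for c in range(8)] for r in range(8)]
current_frame = [[(r + 1) * 10 + (c + 2) for c in range(8)] for r in range(8)]

motion = full_search(reference_frame, current_frame,
                     top=2, left=2, size=2, radius=2)
```

    The number of candidates tested grows with the square of the window side, which is the cost that a fast algorithm with linear complexity in the search window size aims to avoid.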